178 research outputs found

    Learning to Remember Translation History with a Continuous Cache

    Existing neural machine translation (NMT) models generally translate sentences in isolation, missing the opportunity to take advantage of document-level information. In this work, we propose to augment NMT models with a very lightweight cache-like memory network, which stores recent hidden representations as translation history. The probability distribution over generated words is updated online depending on the translation history retrieved from the memory, endowing NMT models with the capability to dynamically adapt over time. Experiments on multiple domains with different topics and styles show the effectiveness of the proposed approach, with negligible impact on the computational cost. Comment: Accepted by TACL 2018.
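
    The mechanism described above can be pictured with a minimal sketch: recent decoder states are stored as keys, the words emitted at those steps as values, and the distribution retrieved from the cache is interpolated with the model's softmax. This is an illustrative toy in numpy, not the paper's implementation; the names ContinuousCache, combine, and the mixing weight lam are assumptions introduced here.

        import numpy as np

        def softmax(x):
            e = np.exp(x - x.max())
            return e / e.sum()

        class ContinuousCache:
            """Toy cache: keys are past hidden states, values are emitted word ids."""
            def __init__(self, capacity=100):
                self.keys, self.values, self.capacity = [], [], capacity

            def add(self, hidden, word_id):
                self.keys.append(hidden)
                self.values.append(word_id)
                if len(self.keys) > self.capacity:   # drop the oldest entry
                    self.keys.pop(0)
                    self.values.pop(0)

            def retrieve(self, query, vocab_size):
                """Distribution over the vocabulary induced by cache matches."""
                p = np.zeros(vocab_size)
                if not self.keys:
                    return p
                weights = softmax(np.array([query @ k for k in self.keys]))
                for w, word_id in zip(weights, self.values):
                    p[word_id] += w
                return p

        def combine(p_model, p_cache, lam=0.2):
            """Interpolate the NMT softmax with the cache distribution."""
            return (1 - lam) * p_model + lam * p_cache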

    Tail-to-Tail Non-Autoregressive Sequence Prediction for Chinese Grammatical Error Correction

    We investigate the problem of Chinese Grammatical Error Correction (CGEC) and present a new framework named Tail-to-Tail (TtT) non-autoregressive sequence prediction to address the deep issues hidden in CGEC. Considering that most tokens are correct and can be conveyed directly from source to target, and that error positions can be estimated and corrected based on bidirectional context information, we employ a BERT-initialized Transformer encoder as the backbone model to conduct information modeling and conveying. Since relying only on same-position substitution cannot handle variable-length correction cases, various operations such as substitution, deletion, insertion, and local paraphrasing are required jointly. Therefore, a Conditional Random Fields (CRF) layer is stacked on the up tail to conduct non-autoregressive sequence prediction by modeling the token dependencies. Because most tokens are correct and easy to predict or convey to the target, the model may suffer from a severe class imbalance issue. To alleviate this problem, focal loss penalty strategies are integrated into the loss functions. Moreover, besides the typical fixed-length error correction datasets, we also construct a variable-length corpus for our experiments. Experimental results on standard datasets, especially the variable-length ones, demonstrate the effectiveness of TtT in terms of sentence-level Accuracy, Precision, Recall, and F1-Measure on both error detection and correction tasks. Comment: ACL 2021. Code: https://github.com/lipiji/TtT. Results of SpellGCN fixed on Oct. 26, 2021.
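
    The focal loss mentioned above is a standard remedy for class imbalance and is easy to sketch: easy, high-probability tokens are down-weighted so that training focuses on the rarer erroneous positions. A generic numpy sketch follows (the function name and the gamma value are illustrative, not taken from the TtT code).

        import numpy as np

        def focal_loss(probs, targets, gamma=2.0, eps=1e-9):
            """Generic focal loss: mean of -(1 - p_t)^gamma * log(p_t).
            probs: (n_tokens, vocab) predicted distributions; targets: gold token ids.
            With gamma = 0 this reduces to ordinary cross-entropy; larger gamma
            shrinks the contribution of tokens the model already gets right."""
            p_t = probs[np.arange(len(targets)), targets]
            return float(np.mean(-((1.0 - p_t) ** gamma) * np.log(p_t + eps)))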

    hyperdoc2vec: Distributed Representations of Hypertext Documents

    Hypertext documents, such as web pages and academic papers, are of great importance in delivering information in our daily life. Although effective on plain documents, conventional text embedding methods suffer from information loss if directly adapted to hyper-documents. In this paper, we propose a general embedding approach for hyper-documents, namely hyperdoc2vec, along with four criteria characterizing the necessary information that hyper-document embedding models should preserve. Systematic comparisons are conducted between hyperdoc2vec and several competitors on two tasks in the academic paper domain, i.e., paper classification and citation recommendation. Analyses and experiments both validate the superiority of hyperdoc2vec over the other models w.r.t. the four criteria. Comment: Accepted to ACL 2018.

    Microblog Hashtag Generation via Encoding Conversation Contexts

    Automatic hashtag annotation plays an important role in content understanding for microblog posts. To date, progress in this field has been restricted to phrase selection from limited candidates or to word-level hashtag discovery using topic models. Different from previous work, which considers hashtags to be inseparable, ours is the first effort to annotate hashtags with a sequence generation framework, viewing each hashtag as a short sequence of words. Moreover, to address the data sparsity issue in processing short microblog posts, we propose to jointly model the target posts and the conversation contexts initiated by them with bidirectional attention. Extensive experimental results on two large-scale datasets, newly collected from English Twitter and Chinese Weibo, show that our model significantly outperforms state-of-the-art classification-based models. Further studies demonstrate its ability to effectively generate rare and even unseen hashtags, which most existing methods cannot do. Comment: NAACL 2019 (10 pages).
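
    As a rough illustration of attending between a post and its conversation context in both directions, here is a generic bidirectional-attention sketch in numpy. It assumes the post and the context have already been encoded into vector sequences and is not the paper's model; the function names are introduced here purely for illustration.

        import numpy as np

        def softmax(x, axis=-1):
            e = np.exp(x - x.max(axis=axis, keepdims=True))
            return e / e.sum(axis=axis, keepdims=True)

        def bidirectional_attention(post, context):
            """post: (m, d) encoded post tokens; context: (n, d) encoded conversation.
            Returns context-aware post vectors and post-aware context vectors."""
            scores = post @ context.T                     # (m, n) similarity matrix
            post2ctx = softmax(scores, axis=1) @ context  # each post token reads the context
            ctx2post = softmax(scores.T, axis=1) @ post   # each context token reads the post
            return post2ctx, ctx2post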

    Evaluating Explanation Methods for Neural Machine Translation

    Recently, many efforts have been devoted to interpreting black-box NMT models, but little progress has been made on metrics to evaluate explanation methods. Word Alignment Error Rate can serve as such a metric that matches human understanding; however, it cannot measure explanation methods on target words that are not aligned to any source word. This paper therefore makes an initial attempt to evaluate explanation methods from an alternative viewpoint: it proposes a principled metric based on fidelity with regard to the predictive behavior of the NMT model. Since exact computation of this metric is intractable, we employ an efficient approximation. On six standard translation tasks, we quantitatively evaluate several explanation methods in terms of the proposed metric and report some valuable findings about them. Comment: Accepted to ACL 2020, 9 pages.
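
    The spirit of a fidelity-based evaluation can be sketched as follows: delete the source words an explanation marks as most relevant and measure how much the model's probability for the predicted target word drops. This is a simplified stand-in, not the paper's metric or its approximation; model_prob is a user-supplied callable and the remaining names are assumptions.

        import numpy as np

        def fidelity_drop(model_prob, src_tokens, relevance, target_word, k=1):
            """model_prob(src_tokens, target_word) -> probability under the NMT model.
            relevance: one score per source token from the explanation method.
            A larger drop after deleting the top-k relevant tokens suggests
            a more faithful explanation."""
            base = model_prob(src_tokens, target_word)
            top_k = set(np.argsort(relevance)[-k:])
            reduced = [t for i, t in enumerate(src_tokens) if i not in top_k]
            return base - model_prob(reduced, target_word)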

    Segmenting Natural Language Sentences via Lexical Unit Analysis

    In this work, we present Lexical Unit Analysis (LUA), a framework for general sequence segmentation tasks. Given a natural language sentence, LUA scores all valid segmentation candidates and uses dynamic programming (DP) to extract the maximum-scoring one. LUA enjoys a number of appealing properties, such as inherently guaranteeing that the predicted segmentation is valid and facilitating globally optimal training and inference. Moreover, the practical time complexity of LUA can be reduced to linear, which is very efficient. We have conducted extensive experiments on 5 tasks, including syntactic chunking, named entity recognition (NER), slot filling, Chinese word segmentation, and Chinese part-of-speech (POS) tagging, across 15 datasets. Our models achieve state-of-the-art performance on 13 of them. The results also show that the F1 score for identifying long segments is notably improved.
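
    The dynamic program behind maximum-scoring segmentation is simple to sketch. The following illustration (not the released LUA code) takes a user-supplied segment scorer score(i, j) and recovers the best segmentation by backtracking; bounding the maximum segment length, as with max_len below, is what brings the practical complexity down to linear in the sentence length.

        def best_segmentation(tokens, score, max_len=10):
            """Return the highest-scoring segmentation of tokens.
            score(i, j) gives the score of the segment tokens[i:j] (user-supplied)."""
            n = len(tokens)
            best = [float("-inf")] * (n + 1)   # best[j]: best score over tokens[:j]
            best[0] = 0.0
            back = [0] * (n + 1)
            for j in range(1, n + 1):
                for i in range(max(0, j - max_len), j):
                    s = best[i] + score(i, j)
                    if s > best[j]:
                        best[j], back[j] = s, i
            segments, j = [], n
            while j > 0:                        # recover the argmax by backtracking
                segments.append(tokens[back[j]:j])
                j = back[j]
            return segments[::-1], best[n]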

    Skeleton-to-Response: Dialogue Generation Guided by Retrieval Memory

    For dialogue response generation, traditional generative models produce responses solely from the input query. Such models rely on insufficient information, since a given query could be answered in multiple ways; consequently, they tend to output generic and dull responses, impeding the generation of informative utterances. Recently, researchers have attempted to fill this information gap with information retrieval techniques: when generating a response for the current query, similar dialogues retrieved from the training data are used as an additional knowledge source. While this can harvest massive information, the generative model may be overwhelmed by it, leading to undesirable performance. In this paper, we propose a new framework that exploits retrieval results via a skeleton-then-response paradigm. First, a skeleton is generated by revising the retrieved responses. Then, a novel generative model uses both the generated skeleton and the original query for response generation. Experimental results show that our approaches significantly improve the diversity and informativeness of the generated responses. Comment: Accepted to NAACL 2019.
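
    A deliberately simplified, heuristic sketch of the retrieve-then-skeleton step is given below. The paper learns the revision with a generative model; here, for illustration only, the most similar past query is retrieved by token overlap and words specific to that retrieved query are blanked out of its response. All function names are assumptions introduced for this sketch.

        def retrieve(query, pairs):
            """pairs: list of (past_query, past_response) token lists; pick the most similar query."""
            def overlap(a, b):
                a, b = set(a), set(b)
                return len(a & b) / max(1, len(a | b))
            return max(pairs, key=lambda p: overlap(query, p[0]))

        def make_skeleton(query, retrieved_query, retrieved_response, blank="<blank>"):
            """Keep words that generalize; blank out words tied only to the retrieved query."""
            specific = set(retrieved_query) - set(query)
            return [blank if w in specific else w for w in retrieved_response]

        # The skeleton, together with the original query, would then be fed
        # to the generative model that produces the final response.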

    Exploiting Sentential Context for Neural Machine Translation

    In this work, we present novel approaches to exploit sentential context for neural machine translation (NMT). Specifically, we first show that a shallow sentential context, extracted from the top encoder layer alone, can improve translation performance by contextualizing the encoding representations of individual words. Next, we introduce a deep sentential context, which aggregates the sentential context representations from all internal layers of the encoder to form a more comprehensive context representation. Experimental results on the WMT14 English-to-German and English-to-French benchmarks show that our model consistently improves performance over the strong TRANSFORMER model (Vaswani et al., 2017), demonstrating the necessity and effectiveness of exploiting sentential context for NMT. Comment: Accepted by ACL 2019.
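
    One way to picture the deep sentential context is a learned weighting of per-layer sentence summaries that is then fused with each word's top-layer representation. The numpy sketch below is only an illustration under that assumption, not the paper's exact architecture; the mean-pooling, the scalar layer weights, and the additive fusion are all simplifications introduced here.

        import numpy as np

        def softmax(x):
            e = np.exp(x - x.max())
            return e / e.sum()

        def deep_sentential_context(layer_states, layer_logits):
            """layer_states: list of (seq_len, d) encoder outputs, one per layer.
            layer_logits: (num_layers,) learnable scalars weighting each layer."""
            summaries = np.stack([h.mean(axis=0) for h in layer_states])  # (L, d)
            weights = softmax(np.asarray(layer_logits, dtype=float))
            return weights @ summaries                                    # (d,) global context

        def contextualize(top_states, context):
            """Fuse the global sentence context into each word representation."""
            return top_states + context   # simple additive fusion for illustration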

    Fine-Grained Sentence Functions for Short-Text Conversation

    Sentence function is an important linguistic feature referring to a user's purpose in uttering a specific sentence. The use of sentence functions has shown promising results in improving the performance of conversation models. However, there is no large conversation dataset annotated with sentence functions. In this work, we collect a new Short-Text Conversation dataset with manually annotated SEntence FUNctions (STC-Sefun). Classification models are trained on this dataset to (i) recognize the sentence function of new data in a large corpus of short-text conversations and (ii) estimate a proper sentence function of the response given a test query. We then train conversation models conditioned on the sentence functions, including both information retrieval-based and neural generative models. Experimental results demonstrate that the use of sentence functions can help improve the quality of the returned responses. Comment: Revised version of our paper accepted by ACL 2019.

    On the Branching Bias of Syntax Extracted from Pre-trained Language Models

    Many efforts have been devoted to extracting constituency trees from pre-trained language models, often proceeding in two stages: feature definition and parsing. However, such methods may suffer from a branching bias, which inflates performance on languages whose branching direction matches the bias. In this work, we propose to quantitatively measure the branching bias by comparing the performance gap between a language and its reversed counterpart, a procedure that is agnostic to both the language model and the extraction method. Furthermore, we analyze the impact of three factors on the branching bias, namely parsing algorithms, feature definitions, and language models. Experiments show that several existing works exhibit branching biases, and that particular choices of these three factors can introduce such a bias. Comment: Findings of EMNLP 2020.
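
    The measurement itself is easy to sketch: run the same extraction method on a corpus and on its token-reversed copy, score both against suitably mirrored gold trees, and report the gap. The sketch below assumes user-supplied parse, score, and reverse_tree functions and is only an illustration of the idea, not the paper's evaluation code.

        def branching_bias(sentences, gold_trees, parse, score, reverse_tree):
            """parse(sentence) -> predicted tree; score(pred, gold) -> e.g. bracketing F1;
            reverse_tree mirrors a gold tree to match the reversed word order.
            A large gap between the two averages indicates a branching bias."""
            n = len(sentences)
            f1_orig = sum(score(parse(s), g) for s, g in zip(sentences, gold_trees)) / n
            rev_sents = [list(reversed(s)) for s in sentences]
            rev_gold = [reverse_tree(g) for g in gold_trees]
            f1_rev = sum(score(parse(s), g) for s, g in zip(rev_sents, rev_gold)) / n
            return f1_orig - f1_rev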